Linguistic Resources and Topic Models for the Analysis of Persian Poems
Abstract
This paper describes the use of Natural Language Processing tools, mostly probabilistic topic modeling, to study semantics (word correlations) in a collection of roughly 18k Persian poems by 30 different poets. For this study, we invested considerable effort in preprocessing and in developing a large-scope lexicon that supports both modern and ancient Persian. In the analysis step, we obtained meaningful results on the correlation between poets and topics, on the evolution of topics through time, and on the correlation between topics and the metre used in the poems. This work should therefore provide valuable results to literature researchers, especially those working on stylistics or comparative literature.

1 Context and Objectives

The purpose of this work is to use Natural Language Processing (NLP) tools, in particular probabilistic topic models (Buntine, 2002; Blei et al., 2003; Blei, 2012), to study word correlations in a special type of Persian poem called “Ghazal” (غزل), one of the most popular Persian poetic forms, originating in the 6th Arabic century. A Ghazal consists of rhythmic couplets with a rhyming refrain (see Figure 1). Each couplet consists of two phrases, called hemistichs. The syllables in all the hemistichs of a given Ghazal follow the same pattern of heavy and light syllables. This pattern produces a musical rhythm, called the metre. Metre is one of the most important properties of Persian poems, and it is the reason why the usual rules of Persian grammar, especially the order of the parts of speech, can be violated in poems. There are about 300 metres in Persian poetry, 270 of which are rare; the vast majority of poems are composed in only 30 metres (Mojiry and Minaei-Bidgoli, 2008). A Ghazal traditionally deals with a single subject, each couplet focusing on one idea.

Figure 1: Elements of a typical Ghazal (by Hafez, calligraphed by K. Khoroush). Note that Persian is written from right to left.
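The metre constraint described above (every hemistich of a Ghazal sharing one pattern of heavy and light syllables) can be illustrated as a simple consistency check. This is a minimal sketch, not the authors' tooling. It assumes each hemistich has already been scanned into a string using a hypothetical notation, `-` for a heavy syllable and `U` for a light one. The function name `shared_metre` is illustrative. Actual Persian scansion is not shown, and the check treats metre as an exact match without modelling any prosodic licences.

```python
from typing import List, Optional, Tuple

HEAVY, LIGHT = "-", "U"  # hypothetical symbols for scanned syllables

# A couplet is two hemistichs, each already scanned into a heavy/light string.
Couplet = Tuple[str, str]


def shared_metre(couplets: List[Couplet]) -> Optional[str]:
    """Return the heavy/light pattern shared by every hemistich, or None if they differ."""
    hemistichs = [h for couplet in couplets for h in couplet]
    if not hemistichs:
        return None
    for h in hemistichs:
        if set(h) - {HEAVY, LIGHT}:
            raise ValueError(f"unexpected symbol in scanned hemistich {h!r}")
    first = hemistichs[0]
    return first if all(h == first for h in hemistichs) else None


# Toy, made-up patterns (not scanned from a real Ghazal).
print(shared_metre([("-U--", "-U--"), ("-U--", "-U--")]))  # -U--
print(shared_metre([("-U--", "-UU-")]))                    # None
```

A real pipeline would still need to map a recovered pattern to a named metre. That lookup is outside the scope of this sketch.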
Since each couplet is devoted to a single idea, the words within a couplet are highly correlated. However, depending on the rest of the couplets, the message of a couplet can often be interpreted differently, owing to the many literary techniques found in Ghazals, e.g. metaphors, homonyms, personification, paradox and alliteration. For this study, we downloaded a Ghazal collection from the Ganjoor poetry website¹, which grants free permission to use its content. The collection covers 30 poets, from Hakim Sanai (1080) to Rahi Moayyeri (1968), with a total of 17,939 Ghazals containing about 170,000 couplets. The metres, as determined by experts (Shamisa, 2004), are also provided for most poems.

¹ http://ganjoor.net/
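The topic-modeling step applied to this collection can be sketched with a minimal collapsed Gibbs sampler for LDA. This is one standard inference method for the model of Blei et al. (2003). The source does not state which inference method or hyperparameters were used, so `alpha`, `beta`, the number of topics and the iteration count below are hypothetical. Documents are assumed to be already tokenized and lexicon-normalized word lists. Averaging topic proportions per poet (`poet_topic_profile`, a hypothetical helper) is only one illustrative way to relate poets to topics.

```python
import random
from collections import defaultdict
from typing import Dict, List


def lda_gibbs(docs: List[List[str]], n_topics: int, alpha: float = 0.1,
              beta: float = 0.01, iters: int = 200, seed: int = 0) -> List[List[float]]:
    """Collapsed Gibbs sampling for LDA; returns per-document topic proportions."""
    rng = random.Random(seed)
    vocab = {w: i for i, w in enumerate(sorted({w for doc in docs for w in doc}))}
    n_vocab = len(vocab)
    ndk = [[0] * n_topics for _ in docs]              # topic counts per document
    nkw = [[0] * n_vocab for _ in range(n_topics)]    # word counts per topic
    nk = [0] * n_topics                               # total tokens per topic
    z: List[List[int]] = []                           # topic of every token

    # Random initialisation of topic assignments.
    for d, doc in enumerate(docs):
        z.append([])
        for w in doc:
            k = rng.randrange(n_topics)
            z[d].append(k)
            ndk[d][k] += 1
            nkw[k][vocab[w]] += 1
            nk[k] += 1

    topics = range(n_topics)
    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                v, k = vocab[w], z[d][i]
                # Remove the current token from all counts.
                ndk[d][k] -= 1
                nkw[k][v] -= 1
                nk[k] -= 1
                # Full conditional p(z_i = t | all other assignments), unnormalised.
                weights = [(ndk[d][t] + alpha) * (nkw[t][v] + beta) / (nk[t] + n_vocab * beta)
                           for t in topics]
                k = rng.choices(topics, weights=weights)[0]
                # Add the token back under its newly sampled topic.
                z[d][i] = k
                ndk[d][k] += 1
                nkw[k][v] += 1
                nk[k] += 1

    # Smoothed document-topic proportions (each row sums to 1).
    return [[(ndk[d][t] + alpha) / (len(doc) + n_topics * alpha) for t in topics]
            for d, doc in enumerate(docs)]


def poet_topic_profile(theta: List[List[float]], poets: List[str]) -> Dict[str, List[float]]:
    """Average the topic proportions of all poems by the same poet."""
    n_topics = len(theta[0]) if theta else 0
    sums: Dict[str, List[float]] = defaultdict(lambda: [0.0] * n_topics)
    counts: Dict[str, int] = defaultdict(int)
    for row, poet in zip(theta, poets):
        for t, p in enumerate(row):
            sums[poet][t] += p
        counts[poet] += 1
    return {poet: [s / counts[poet] for s in sums[poet]] for poet in sums}


# Toy usage with hypothetical, transliterated tokens and poet labels.
docs = [["mey", "saqi", "jam", "mey"], ["eshq", "yar", "del"],
        ["jam", "mey", "saqi"], ["del", "eshq", "yar", "eshq"]]
poets = ["poet_A", "poet_A", "poet_B", "poet_B"]
theta = lda_gibbs(docs, n_topics=2, iters=100)
for poet, profile in poet_topic_profile(theta, poets).items():
    print(poet, [round(p, 2) for p in profile])
```

The sketch treats each Ghazal as one document. Because each couplet focuses on one idea, treating couplets as documents is another plausible granularity. Which one the authors chose is not stated in this section.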